Three-phase hierarchical model-based and hybrid inference

Global commitments to mitigating climate change and halting biodiversity loss require reliable information about Earth's ecosystems. Increasingly, such information is obtained from multiple sources of remotely sensed data combined with data acquired in the field. This new wealth of data poses challenges regarding the combination of different data sources to derive the required information and assess uncertainties. In this article, we show how predictors and their variances can be derived when hierarchically nested models are applied. Previous studies have developed methods for cases involving two modeling steps, such as biomass prediction relying on tree-level allometric models and models linking plot-level field data with remotely sensed data. This study extends the analysis to cases involving three modeling steps to cover new important applications. The additional step might involve an intermediate model, linking field and remotely sensed data available from a small sample, for making predictions that are subsequently used for training a final prediction model based on remotely sensed data:• In cases where the data in the final step are available wall-to-wall, we denote the approach three-phase hierarchical model-based inference (3pHMB),• In cases where the data in the final step are available as a probability sample, we denote the approach three-phase hierarchical hybrid inference (3pHHY).


Method details
Background Assessment of state and change of forest biomass has become important worldwide due to the large carbon dioxide fluxes incurred by deforestation and afforestation, as well as by tree growth and harvesting in managed forests (e.g., [1] ). Intergovernmental Panel on Climate Change (IPCC) guidance specifies that parties must report changes in biomass carbon stocks as well as uncertainties of the reported figures. Since large parts of the world's forests are inaccessible, applying remotely sensed (RS) data is an important option for such assessments (ibid.). However, the accuracies of the assessments should be quantified using objective and consistent methods (e.g., [2] ).
A standard model-based approach to forest biomass assessment is to specify and estimate a model that links biomass measured in the field with RS metrics. The estimated model is then applied across the forest area of interest, whereby biomass predictions and their uncertainties can be obtained utilizing standard methods of model-based inference (e.g., [3] ). Uncertainties arise since a model cannot perfectly predict the actual biomass conditions. In some cases, several models are needed to make biomass assessments feasible. For example, models are often applied to predict the biomass of individual trees in the field from measurements of their diameters and heights. Subsequently, these predicted biomasses are used for developing models linking field biomass at the level of sample plots with RS metrics. In this case, the overall uncertainty arises from two modeling steps (e.g., [4] ).
Sometimes the available data structure requires additional levels in the hierarchy. New methods that utilize several RS-based models in hierarchical structures call for further development of methods to assess uncertainties. An example application involves modeling plot level biomass from field measurements, linking these with laser measurements through a second model, and utilizing laser-based predictions of biomass at sample locations as substitutes for field data in estimating a third model for biomass prediction based on RS data available wall-to-wall, such as data from the Landsat satellite [18] . In this case, three hierarchically nested modeling steps contribute to the overall uncertainty of the biomass prediction for the study area. Ideally, the different datasets involved should stem from about the same time point Hou et al. [19] .
Note the difference between this type of hierarchically nested models and the hierarchy of data and models applied in standard multi-level modeling (e.g., [20] ). In our case, the target response variable in the first modeling step is replaced by model predictions of the response variable in the subsequent modeling steps. In standard multi-level modeling, the response variable does not change, and multi-level modeling (also called mixed-effects modeling) refers to handling groups of observations, within which the observations are dependent.
This article aims to develop and demonstrate predictors and methods for uncertainty assessment for model-based inference involving three hierarchically nested modeling steps. If data in the final step are available wall-to-wall, we denote the method three-phase hierarchical model-based inference (3pHMB). If data in the last step are available from a probability sample, we denote the method three-phase hierarchical hybrid inference (3pHHY). The methods are similar in the first three steps, and differ only in the last step (see the Graphical Abstract).

Overview
In the following sections, we first derive and present the formulas needed for applying the 3pHMB and 3pHHY frameworks. Secondly, we provide further theoretical support for the approach through an analysis based on a multivariate superpopulation model; in this section, we also demonstrate important consequences of multi-stage hierarchical modeling. Lastly, we validate our theoretical results through Monte Carlo simulation.

Details of three-phase hierarchical model-based inference (3pHMB)
We base our description of 3pHMB inference on an example from forest inventory aiming at assessing the biomass for some large study area. We define our population as the grid-cells that tessellate the study area into units. The size of the grid-cells corresponds to the size of plots utilized in field inventories and to the size of pixels for which RS data can be retrieved. For each grid-cell RS data are available, e.g., multispectral optical satellite data from the Landsat satellite (e.g., [21 , 22] ). With the 3pHMB method, wall-to-wall RS data are used for predicting the biomass in each grid-cell and through averaging across the study area we predict the biomass density. All other data sources, described below, are utilized for providing data for estimating the parameters of this prediction model, which we denote the Landsat model.
When the 3pHMB method is applied, field data are not adequate for immediately estimating the parameters of the Landsat model, but "pseudo-field " biomass data are available from a sample of accurate predictions based on, e.g., airborne laser scanning (ALS) data. This sample dataset is used for estimating the Landsat model parameters. However, to make the ALS-based predictions, the parameters of the ALS model must also be estimated. This model is estimated from a small sample of field biomass assessments with corresponding ALS data.
The biomasses of trees in the field also need to be predicted from models (often called allometric biomass models, e.g., [23] ), since direct measurement is expensive and destroys the resource being monitored. Thus, biomass models are typically estimated from small samples of trees, which are cut down and carefully weighed during dedicated research studies (e.g., [24 , 25] ). When applying the 3pHMB method we often aggregate model predictions of tree-level biomass to plot level, e.g., using the method described in Saarela et al. [ [4] , pp. 11-12, Section "Aggregation of tree-level AGB predictions to plot level "] and in Varvia et al. [ [26] , Appendix A.1]. In this article, we do not address the details of this aggregation but assume that plot level biomass is either available from aggregation of tree-level predictions (based on field data) or directly predicted from plot level measurements in the field of, e.g., basal area and mean height.
Thus, our study area of interest (AOI) is tessellated into grid-cells that constitute the population elements. We assume that the objective is to predict the population mean, ̄ , of grid-cell level aboveground biomass density (AGBD), defined as where denotes the set of elements in the AOI. The RS data available for this set (e.g., Landsat data) are denoted , which is a matrix with observations of explanatory variables (plus a column of 1 ′ s for a model intercept). The dataset available for estimating the prediction model used in the final stage is denoted . It has sample elements and contains data of the kind described above (denoted for this set) and intermediate RS data (e.g. from ALS) denoted , which is a matrix with observations of explanatory variables (plus a column of 1 ′ s). Further, the dataset has elements and data in the datasets (as above, but with observations) and , which is a matrix with observations of ℎ variables (plus a column of 1 ′ s). The latter data are field measurements (such as basal area, and plot averages of tree diameters and tree heights). Finally, the dataset has observations. It contains field measurements in (as above, but with observations) and data from actual measurements of AGBD, in the vector . For deriving a formula for the variance of the 3pHMB method, we start by defining a univariate linear model F, which links AGBD and field measurements at the grid-cell level, using data from : where is the variable of interest (AGBD), is a ( ℎ + 1 ) -length row vector of explanatory variables (field measurements) with a unit term for the intercept, is a ( ℎ + 1 ) -length column vector of model parameters to be estimated, and is a variability (a.k.a. error) term which we assume follows a normal distribution with zero mean and constant variance 2 . 1 The model parameters are estimated using the ordinary least squares (OLS) estimator (e.g., [27] ): The estimated model is then used to predict the target variable (AGBD) for the elements in : The index is used to denote that the AGBD predictions in are based on the model F ( Eq. (1) ). The values ̂ are predictions of AGBD based on field data, which are subsequently used for estimating the parameters of the model predicting AGBD from ALS data. The ALS model is thus the next model to be addressed in the model hierarchy.
To support the construction of the ALS model, we specify a multivariate model, G * , which links the ( ℎ + 1 ) -multivariate response variable of field measurements and a ( + 1 ) -length row vector of ALS explanatory variables: where * is a ( ( + 1 ) × ( ℎ + 1 ) ) -matrix of model parameters to be estimated, * is a ( ℎ + 1 ) -length multivariate vector of variability terms, which are assumed to be independent, normally distributed, have zero mean, and their variance-covariance matrix * is of size ( ( ℎ + 1 ) × ( ℎ + 1 ) ). By multiplying each model component in G * by we obtain the univariate model ( [28] , Appendix A.3) Denoting * as , * as υ , and * as 2 , we obtain the following model, which we denote as G: Here, a ( + 1 ) -length row vector of ALS explanatory variables with a unit term for the intercept and is a ( + 1 ) -length column vector of model parameters to be estimated. This model links the expectation of AGBD, due to the model F, with ALS explanatory variables.
Here, ̂ * = ( ) −1 are estimated model parameters from the multivariate model G * . At this point, we have utilized the actual measurements of AGBD from to predict AGBD for all elements in based on field measurement data. The AGBD predictions for have subsequently been used for training an ALS model (the G model), through which we now can predict AGBD for : The predicted values ̂ are used for training the model to be applied in the last stage, based on wall-to-wall RS data, i.e., the Landsat model.
In support of constructing the Landsat model, we introduce a second multivariate model, which links the ( + 1 ) -multivariate variable (ALS variables) as a multivariate response variable with a ( + 1 ) -length row vector of Landsat explanatory variables ; the model is denoted Q * : where * is a ( ( + 1 ) × ( + 1 ) )-matrix of model parameters, * is a ( + 1 ) -multivariate variable of variability terms which are independent, normally distributed, have zero mean and variance-covariance * of size ( ( + 1 ) × ( + 1 ) ) . By multiplying model components in Q * with the model coefficients from G, we obtain the following univariate model Denoting * as , * as and * as 2 we obtain the following model, which we denote as Q: where, is an ( + 1 ) -length vector of model coefficients to be estimated. The ALS-based predictions of AGBD, ̂ , are used for estimating the model parameters, , using information on explanatory Landsat variables from : are estimated model parameters from the model Q * . Finally, the estimated Q model is used to predict the AGBD across the entire AOI, i.e., for all the elements in : The 3pHMB population mean predictor is then: where ̄ is the ( + 1 ) -length row vector of Landsat explanatory variable averages over the AOI . Table 1 summarizes the description of the datasets and the estimation steps. In the table, the datasets used in the analyses are listed with descriptions of available variables and whether they are used for model training or model application. Fig. 1 gives a graphical overview of 3pHMB prediction.

The 3pHMB predictor variance and its estimator
Application of 3pHMB for making predictions, only, across some AOI is fairly straightforward, and does not require all the model formalism from the previous section. What is required is a procedure where the AGBD is first measured on the elements in , then predicted in a stepwise manner for the elements in , , and . However, the formalism introduced in the previous section is needed for assessing the uncertainty associated with predicting the AGBD for the AOI. In this section, we build on the previous section and derive a formula for the variance of the predictor of AGBD, and its corresponding estimator. For deriving the 3pHMB predictor variance, we begin with its expectation: Since independent datasets (normally) are used for the three modeling steps, the estimated parameters in each step are independent and thus Thus, the expectation of the predictor of the population mean is Following the definition, the variance of ̂ It can be noted that the covariance of the estimated model parameters is at the core of this expression. Since several modeling steps have been involved in the estimation of the model parameters ̂, the covariance can be decomposed into the following terms using the law of total covariance [29] : where the step from the second last to the last line is obtained from However, ( ̂) in Eq. (6) emanates from several interacting models, and in order to present the variance formula in a format that would later on allow estimation of, ( ̂) is further decomposed using the law of total covariance: where the third term ( . Thus, the variance of the AGBD predictor can be expressed as: In the simulation analyses that we performed to validate our methodology (illustrated in Table 7 ), we found that all the terms involving traces were small, and could be neglected in the variance formula, for simplification. Thus, a good approximation of the variance of the AGBD predictor would be By replacing the covariances of estimated model parameters and the model parameters by their corresponding estimators, we obtain a variance estimator. In the simulation analysis that we performed to validate our methodology, it was found that the variance estimators were approximately unbiased.
If we would like to derive the MSE of the predictor rather than the variance, we may define the true AGBD over the AOI through the models involved. We first express the AGBD variable using the model Q as Using model G, we can express as − and thus − = + , leading to = + + .
Using model F, we can express as − and thus − = + + , which implies that the variable of interest for all elements in the AOI can be expressed using models F, G and Q as The AGBD mean for can thus be expressed as The MSE of the predictor ̂ 3 is then, according to the definition of MSE, Under an assumption of independence between the variability terms, the MSE can be expressed as However, note that it is not obvious that the three variability terms are independent, and typically there is also spatial autocorrelation among them. Thus, MSE expressions become complicated and require further investigation.

Details of three-phase hierarchical hybrid inference (3pHHY)
Next, we present our second case, i.e., when the last step auxiliary information is available only for a probability sample from the AOI, rather than wall-to-wall. This step is the only difference compared to 3pHMB inference. In our forest inventory example, the auxiliary information in this case could be retrieved from the ICESat-2 spaceborne LiDAR.
Thus, ̄ in this case is estimated using the design-based estimator (e.g., [30] , p. 42) where is a probability sample drawn from the AOI, is a vector of ICESat-2 auxiliary values for the ℎ element in the sample, and is the probability of including the ℎ element to the sample . The design-based expectation of ̂ is The 3pHHY predictor of the AGBD mean over the AOI is then (cf., [31][32][33] ) Fig. 2 gives a graphical overview of 3pHHY prediction.

The 3pHHY predictor variance and its estimator
To derive the variance of the 3pHHY predictor, we begin with the expectation of the predictor, which can be decomposed as where the subscript denotes expectation due to the design. The variance of the 3pHHY predictor can be decomposed using the law of total variance as ] . (11) The third term in the final expression of Eq. (11) is due to .
Our simulation analysis showed that the third term, [ (11) is small and may be neglected. Thus, for any given sampling design the 3pHHY variance is Under a simple random sampling without replacement design, the variance is where is the sample size, ( 1 − ) is the finite population correction factor, and ( ) is the population covariance of the ICESat-2 explanatory variables. In the simulations performed to validate the results, Eq. (13) was used, since simple random sampling was assumed.
By replacing the covariance of estimated model parameters, ( ̂) , the model parameters, , and the design-based covariance ( 1 ∑ ) with their corresponding estimators we obtain a variance estimator. Our simulation analysis showed that the variance estimator is approximately unbiased. If we would like to derive the MSE of the 3pHHY predictor, we may proceed along the route previously outlined for 3pHMB, i.e.
Like in the case of 3pHMB, this MSE formula assumes independence between the variability terms and absence of spatial autocorrelation. Thus, deriving a useful MSE expression is complicated and requires further investigation.

Study of hierarchical modeling in the context of a superpopulation model
The previous sections of this article have provided the formulas needed for applying 3pHMB and 3pHHY for prediction, and for assessing uncertainties. In this section, we study properties of hierarchical modeling that cannot be straightforwardly deduced from the formulas presented in the previous sections. We do this by specifying a superpopulation model with normally distributed random variables. Some of the non-standard models from the previous sections, e.g., Eqs. (2) and (3) , will now be derived through conditional expectation, thus validating the previously presented models. This section will also provide insights into effects of hierarchical modeling that are important to understand, such as the reduced variability of the response variable the more hierarchically nested models are added. Examples of the magnitude of such variability reduction are provided.
We assume that with each population element a multivariate random variable is associated, i.e., for the ℎ observation, the values ( , , , ) are outcomes from a multivariate random variable ( , , , ). The joint distribution of the random variables for the elements in the population defines a superpopulation model denoted ( [34] , p. 80).
The superpopulation model shapes the relationship between the variables in the population, and the observations linked to them. We assume the same type of population elements, variables and observations as previously, i.e. is the AGBD measured in the field, is some field plot measurements, such as average tree diameter (DBH), is some ALS variable, and some Landsat (in the case of 3pHMB) or ICESat-2 variable (in the case of 3pHHY). The values ( , , , ) are realizations from the corresponding multivariate random variable ( , , , ).
We assume a multivariate normal distribution, i.e. , , ) is a vector of superpopulation means, and is the superpopulation variance-covariance matrix with variances 2 ( ⋅) for each variable on the diagonal and covariances ( ⋅)( ⋅) between the variables on the off-diagonal elements. We assume that our multivariate variables are exchangeable, i.e., every permutation of the order of the population elements follows the same joint distribution ( [34] , p. 85).
We now derive the models previously presented Eqs. (1) , ( (2) , and (3) ) through the superpopulation model. We begin with describing the model linking AGBD and DBH, i.e., the model F. Following the joint distribution , the conditional probability density function of , given that = for the ℎ unit, is where , ( , ) is the joint probability density function of and , and ( ) is the marginal density function for . Under the normality assumption the conditional expectation of given = can be expressed as ( [35] , p. 72) where is the correlation between and , and ( ⋅) is the superpopulation standard deviation for the given variable. The variance of the conditional distribution is ( [35] , p. 72) The conditional distribution of , given that = for the ℎ observation is then Thus, in our example, the AGBD random variable associated with the ℎ population element has the expectation and variance 2 ; is the DBH random variable with expectation and variance 2 . However, for a fixed outcome from , there is a random subset of conditional on the fixed ; this subset is the variable | = with expectation | = + ( − ) and variance 2 | = 2 ( 1 − 2 ) . This corresponds to regressing on , which reveals how much information about is contained in an observation of ( [35] , p. 73). Therefore, using the conditional distribution, we can define our first model (F), which links AGBD and DBH as where is the variability term. By denoting 2 = 2 | , 0 = ( − ) and 1 = we can rewrite F in the form previously introduced, i.e.
and in linear algebra notation where = ( 0 , 1 ) and = ( 1 , ) . By substituting the variances and correlations in the parameter formulas with their corresponding estimators, we obtain the well-known formulas for parameter estimation from OLS regression. Similarly, to derive the model G ( Eq. (2) ), we introduce [ | ] , which is a random variable since we now condition on the random variable rather than some fixed outcome The expectation of [ | ] can be derived through the law of iterated expectations ( [27] , p. 132) The next step is to introduce the ALS variable, . Based on our superpopulation model, the conditional distribution of [ | ] given that = has the expected value (cf. Eq. (15) ) where, The variance is (cf. Eq. (16) ) The conditional distribution of [ | ] , given that = is then Thus, we can define the model G: Denoting model parameters 0 = ( − ) , 1 = , and the variance of the variability term as 2 = 2 | | , we can rewrite G as and in linear algebra notation and expectation (following the law of iterated expectations) The conditional distribution of given that the covariance and the variance is  Table 3 The decrease of the variance of the response variable in model G ( 2 2 ) for different correlations ( ) between the explanatory and response variables in model F.
Thus, the model Q is Denoting the model parameters as 0 = ( − ) and 1 = , the variance of the variability term as and in linear algebra notation as ) , = ( 0 , 1 ) and = ( 1 , ) , which coincides with Eq. (3) in the previous section.

Decreased variability of the response variable
Deriving the models through conditional expectation reveals properties of 3pHMB (and 3pHHY) that cannot be immediately observed from the first section of this article. An important property to note is that as more modeling steps are included, the variability of the response variable decreases. This decreased variability (of proxy AGBD in our example) may have important implications in applications. One obvious case is if the objective is to map the AGBD distribution across a landscape. In case several hierarchically nested models have been utilized, the mapped AGBD variability in the landscape would be substantially smaller than the real variability. A potential solution to avoiding such decreased variability could be to apply calibration (e.g., [36] ).
The decreased variability may also have negative implications if the objective is to predict the AGBD, not least for domains. As pointed out by Chambers and Clark [37] , a means to minimize the uncertainty of model-based predictors is to ensure first-order balanced samples, meaning that the mean values of explanatory variables in the dataset used for model estimation coincide with the mean values of the explanatory variables in the target population. Although not specifically studied in this article, we hypothesize that first-order balanced sampling becomes increasingly important the more modeling steps are included. Further, whereas the most relevant uncertainty measure in connection with model-based inference would be the MSE, several studies suggest that for large areas, the relative difference between MSE and variance would be small (e.g., [33,38] ) in case model-unbiased predictors are applied. However, in case the variability of the response variable is substantially reduced, it remains to be evaluated if this assertion holds.
Thus, model-based inference should be applied with caution in case the variability of the (proxy) response variable is substantially reduced compared to the real-world variability of the response variable. Below, we demonstrate how the variance of the response variable is affected by multi-step modeling (the models F, G and Q). Table 2 gives an overview of the models and the corresponding expectation and variance of the response variable, under the previously introduced superpopulation model.
From Table 2 , we can see that whereas the response variables have the same expectation, their variability around the expectation is not the same. With the increased hierarchy of conditions, the variability of the proxy target variable is decreasing in comparison to the variability of the actual target variable.
In Tables 3 and 4 we show numerical examples of the reduction of the variance of the response variable. In Table 3 , this is shown for model G, i.e., for the case of two modeling steps.
From Table 3 we can see that, e.g., if the correlation between the response and explanatory variables is 0.8, the variance of the response variable in model G is 36% smaller than the original variance of the response variable; if the correlation is 0.7, about half of the variability is lost. Table 4 gives numerical examples for the case of three modeling steps. Now, the decreased variance of the response variable is further accentuated. For example, when both correlations are 0.8, 59% of the original variance is lost; if both are 0.7, the corresponding figure is 76%.  The input data mimics the conditions of the empirical case study presented in Varvia et al. [26] . Note that, although < < < < in this case, the presented methodological framework does not require any specific dataset size relations. Table 4 suggests that even if the correlation between explanatory variables and the response variable is fairly strong in the first two modeling steps, the last model is estimated using a response variable with substantially reduced variability compared to the response variable it is mimicking.

Methods validation
We validated the correctness of the presented variance formulas for 3pHMB and 3pHHY prediction through Monte Carlo (MC) simulation. A real-world application of 3pHHY is presented by Varvia et al. [26] .
An R Markdown file is available as supplementary material to this article. It provides an R code for the simulation with a step-bystep description of how the simulations were performed. The input information mimics boreal forest conditions in the northern part of Finland. The target variable is the AGBD ( ). The explanatory variable for model F is DBH ( ) ; in model G it is an ALS variable related to vegetation height ( ), and in model Q, it is an ICESat-2 variable related to vegetation height ( ). The superpopulation means ( ( ⋅) ) and standard deviations ( ( ⋅) ) of the variables and the correlations between them are presented in Table 5 . The sizes of the datasets are also given in the table.
The evaluation of the performance of the proposed predictors and variance formulas was conducted following the steps described below.
• Step 1a: Using the input information on the superpopulation means and standard deviations, the "true " model parameters and variances of variability terms in models F, G * , G, Q * and Q were obtained following the theoretical outline presented in the section "Study of hierarchical modeling in the context of a superpopulation model ". • Step 1b: The explanatory variables ( for model F over the dataset , for model G over the dataset , for model Q over the dataset , and over ) were generated based on their mean values and standard deviations following normal distributions.
• Step 1c: The variances of the 3pHMB and 3pHHY predictors were computed according to Eqs. (8a) and (11) based on the results from steps 1a and 1b.
Steps 1a -1c are preparations for the MC simulation, but located outside the MC loop, since they are repeated only once. The MC iterations included the following steps: • Step 2a: The variability terms for the models F, G * , G, Q * and Q were generated randomly based on Step 1a. As a consequence, the response variables for models were also generated for all elements in the datasets.

•
Step 2b: The model parameters of the models F, G * , G, Q * and Q were estimated based on the simulated data, and applied for predicting the AOI population mean following 3pHMB inference. The predicted value was recorded for each MC iteration. • Step 2c: A simple random probability sample (without replacement) was drawn from and the AOI population mean was predicted following 3pHHY inference. Note that the predicted value was recorded for each MC iteration.
• Step 2d: Variance estimators were applied to estimate the variance of the 3pHMB and 3pHHY predictors based on the simulated outcomes from each MC iteration. The estimates were recorded for each MC iteration.

Table 6
The variance of the 3pHMB and 3pHHY predictors and their corresponding estimated variances; ( ̂ ( ⋅) ) is the variance of the population mean predictor based on Eqs. (8a) and (11) , ( ̂ ( ⋅) ) is the empirical variance of the population mean predictor from the MC iterations, and ̂ ( ̂ ( ⋅) ) is the average of estimated variances across the MC iterations. 3pHMB 3pHHY 10.96 10.96 11.03 11.04 11.04 Table 7 The contribution of the terms involving traces to the overall variance of 3pHMB and 3pHHY predictors. Steps 2a -2d were repeated one million times. Based on the MC simulations, the empirical variances of the 3pHMB and 3pHHY predictors were obtained and could be compared with the variances according to the results from Eqs. (8a) and (11) as well as the corresponding variance estimators.
The results of the simulations are presented in Tables 6 and 7 . In addition, it was observed that the average of the predicted AOI population means over the MC iterations based on the 3pHMB and 3pHHY predictors corresponded well with the AGBD superpopulation mean, which shows that the predictors are approximately unbiased. Table 6 shows the results from comparing the analytically derived variances of the predictors with the corresponding empirical variances from the MC iterations, and the average values of estimated variances. The Table shows that Eqs. (8a) and (11) are valid as variance formulas and that the corresponding variance estimators are approximately unbiased.
Referring to the Eqs. (8a) and ( 8b ), it was suggested that terms with traces in (8a) could be neglected due to their small contribution to the overall variance of the 3pHMB and 3pHHY predictors. The empirical results in Table 7 validate this assertion.

Conclusions
The main purpose of this study was to present predictors, variances, and variance estimators for hierarchical model-based inference and hierarchical hybrid inference involving three modeling steps. Previous studies have proposed solutions for methods involving two modeling steps (e.g., Saarela et al. [39] ). The need for methods of this kind emanates from forest resources assessment utilizing multiple sources of remotely sensed data in combination with field data. In Monte Carlo simulations, the correctness of the proposed formulas was validated. In addition, it was shown that caution should be exercised when applying several modeling steps based on models with weak correlation between the explanatory variable(s) and the response variable, since multiple modeling steps dramatically reduce the variability of the predicted target variable, which may lead to problems in some applications. For example, a map produced on the basis of multiple modeling steps would display substantially reduced variability for the target variable compared to the real variability.
Although this article has focused on applications based on remotely sensed data, the methods are general and can be applied in any discipline where empirical data are lacking but proxies can be obtained through hierarchically nested models.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
The data were simulated inside the validation code provided by the authors. An R Markdown file is attached as supplementary material.